AITopics | singular vector

Collaborating Authors

singular vector

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LoRA vs Full Fine-tuning: An Illusion of Equivalence

Neural Information Processing SystemsJun-23-2026, 04:36:06 GMT

Fine-tuning is a crucial paradigm for adapting pre-trained large language models to downstream tasks. Recently, methods like Low-Rank Adaptation (LoRA) have been shown to effectively fine-tune LLMs with an extreme reduction in trainable parameters. But, are their learned solutions really equivalent? We study how LoRA and full-finetuning change pre-trained models by analyzing the model's weight matrices through the lens of their spectral properties. We find that LoRA and full fine-tuning yield weight matrices whose singular value decompositions exhibit very different structure: weight matrices trained with LoRA have new, high-ranking singular vectors, which we call intruder dimensions, while those trained with full fine-tuning do not. Further, we extend the finding that LoRA forgets less than full fine-tuning and find its forgetting is vastly localized to the intruder dimension - by causally intervening on the intruder dimensions by changing their associated singular values post-fine-tuning, we show that they cause forgetting. Moreover, scaling them down significantly improves modeling of the pre-training distribution with a minimal drop in downstream task performance. Given this, we should expect accumulating intruder dimensions to be harmful and lead to more forgetting. This will be amplified during continual learning because of sequentially fine-tuning, and we show that LoRA models do accumulate intruder dimensions here tend to perform worse in this setting, emphasizing the practicality of our findings.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.92)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.67)

Add feedback

Small Singular Values Matter: ARandom Matrix Analysis of Transformer Models

Neural Information Processing SystemsJun-22-2026, 23:47:01 GMT

This work analyzes singular-value spectra of weight matrices in pretrained transformer models to understand how information is stored at both ends of the spectrum.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Europe (0.28)
Asia (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Enhancing Graph Classification Robustness with Singular Pooling

Neural Information Processing SystemsJun-22-2026, 19:12:09 GMT

Graph Neural Networks (GNNs) have achieved strong performance across a range of graph representation learning tasks, yet their adversarial robustness in graph classification remains underexplored compared to node classification. While most existing defenses focus on the message-passing component, this work investigates the overlooked role of pooling operations in shaping robustness. We present a theoretical analysis of standard flat pooling methods (sum, average and max), deriving upper bounds on their adversarial risk and identifying their vulnerabilities under different attack scenarios and graph structures. Motivated by these insights, we propose Robust Singular Pooling (RS-Pool), a novel pooling strategy that leverages the dominant singular vector of the node embedding matrix to construct a robust graph-level representation. We theoretically investigate the robustness of RS-Pool and interpret the resulting bound leading to improved understanding of our proposed pooling operator. While our analysis centers on Graph Convolutional Networks (GCNs), RS-Pool is model-agnostic and can be implemented efficiently via power iteration. Empirical results on real-world benchmarks show that RS-Pool provides better robustness than the considered pooling methods when subject to state-of-the-art adversarial attacks while maintaining competitive clean accuracy. Our code is publicly available at: https://github.com/king/rs-pool.

artificial intelligence, machine learning, robustness, (17 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Neurons as Detectors of Coherent Sets in Sensory Dynamics

Neural Information Processing SystemsJun-14-2026, 06:41:48 GMT

From prior experience, neurons learn {\it coherent sets}--regions of stimulus state space whose trajectories evolve cohesively over finite times--and assign membership indices to new stimuli. Coherent sets are identified via spectral clustering of the {\it stochastic Koopman operator (SKO)}, where the sign pattern of a subdominant singular function partitions the state space into minimally coupled regions. For multivariate Ornstein-Uhlenbeck processes, this singular function reduces to a linear projection onto the dominant singular vector of the whitened state-transition matrix. Encoding this singular vector as a receptive field enables neurons to compute membership indices via the projection sign in a biologically plausible manner. Each neuron detects either a {\it predictive} coherent set (stimuli with common futures) or a {\it retrospective} coherent set (stimuli with common pasts), suggesting a functional dichotomy among neurons. Since neurons lack access to explicit dynamical equations, the requisite singular vectors must be estimated directly from data, for example, via past-future canonical correlation analysis on lag-vector representations--an approach that naturally extends to nonlinear dynamics. This framework provides a novel account of neuronal temporal filtering, the ubiquity of rectification in neural responses, and known functional dichotomies. Coherent-set clustering thus emerges as a fundamental computation underlying sensory processing and transferable to bio-inspired artificial systems.

artificial intelligence, name change, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.82)

Add feedback

Accelerating Power Method with Fast Sketching for Stronger Low-Rank Approximation

Chenakkod, Shabarish, Dereziński, Michał

arXiv.org Machine LearningMay-12-2026

The power method is one of the most fundamental tools for extracting top principal components from data through low-rank matrix approximation. Yet, when the target rank is large, the cost of matrix multiplication associated with this procedure becomes a major bottleneck. We develop an algorithmic and theoretical framework for accelerating the power method using fast sketching, which is a popular paradigm in randomized linear algebra. Our framework leads to simple and provably efficient methods for singular value decomposition, low-rank factorization, and Nyström approximation, which attain strong numerical performance on benchmark problems. The key novelty in our analysis is the use of regularized spectral approximation, a property of fast sketching methods which proves more flexible in generalizing power method guarantees than traditional arguments.

artificial intelligence, machine learning, power method, (15 more...)

arXiv.org Machine Learning

2605.09755

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)

Add feedback

Can we globally optimize cross validation loss in ridge regression

Neural Information Processing SystemsApr-27-2026, 03:01:02 GMT

Models like LASSO and ridge regression are extensively used in practice due to their interpretability, ease of use, and strong theoretical guarantees. Crossvalidation (CV) is widely used for hyperparameter tuning in these models, but do practical optimization methods minimize the true out-of-sample loss? A recent line of research promises to show that the optimum of the CV loss matches the optimum of the out-of-sample loss (possibly after simple corrections). It remains to show how tractable it is to minimize the CV loss. In the present paper, we show that, in the case of ridge regression, the CV loss may fail to be quasiconvex and thus may have multiple local optima. We can guarantee that the CV loss is quasiconvex in at least one case: when the spectrum of the covariate matrix is nearly flat and the noise in the observed responses is not too high. More generally, we show that quasiconvexity status is independent of many properties of the observed data (response norm, covariate-matrix right singular vectors, and singular-value scaling) and has a complex dependence on the few that remain. We empirically confirm our theory using simulated experiments.

artificial intelligence, machine learning, optimization problem, (19 more...)

Neural Information Processing Systems

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.82)

Add feedback

02dd0db10c40092de3d9ec2508d12f60-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 06:00:14 GMT

artificial intelligence, machine learning, singular value, (17 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec (0.29)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Communications > Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Cognitive Science (0.69)

Add feedback

Learning better with Dale's Law: ASpectral Perspective

Neural Information Processing SystemsApr-24-2026, 06:00:10 GMT

Most recurrent neural networks (RNNs) do not include a fundamental constraint of real neural circuits: Dale's Law, which implies that neurons must be excitatory (E) or inhibitory (I). Dale's Law is generally absent from RNNs because simply partitioning a standard network's units into E and I populations impairs learning. However, here we extend a recent feedforward bio-inspired EI network architecture, named Dale's ANNs, to recurrent networks, and demonstrate that good performance is possible while respecting Dale's Law. This begs the question: What makes some forms of EI network learn poorly and others learn well? And, why does the simple approach of incorporating Dale's Law impair learning? Historically the answer was thought to be the sign constraints on EI network parameters, and this was a motivation behind Dale's ANNs. However, here we show the spectral properties of the recurrent weight matrix at initialisation are more impactful on network performance than sign constraints. We find that simple EI partitioning results in a singular value distribution that is multimodal and dispersed, whereas standard RNNs have an unimodal, more clustered singular value distribution, as do recurrent Dale's ANNs. We also show that the spectral properties and performance of partitioned EI networks are worse for small networks with fewer I units, and we present normalised SVD entropy as a measure of spectrum pathology that correlates with performance.

artificial intelligence, machine learning, spectral property, (17 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: